We are now entering the phase of doing serious statistics: we run regression models, inspect them, plot, and tabulate them.
For this exercise, let’s again pretend to be researchers interested in the determinants of the subjective likelihood estimation of being treated for COVID-19 in a hospital. We suspect that age plays an important role, something we have already assessed by visualizing a similar relationship. So… we need the GESIS Panel COVID-19 data again:
library(dplyr)
library(haven)
library(sjlabelled)
gp_covid <-
read_sav(
"../data/ZA5667_v1-1-0.sav"
) %>%
set_na(na = c(-1:-99, 97)) %>%
mutate(
likelihood_hospital = hzcy003a,
age_cat = as.factor(age_cat)
) %>%
remove_all_labels()
We start our analysis by checking the relationship with an ANOVA. We also include some very essential covariates: gender, political orientation, and marital status.
+.
Alright, there’s definitely something in the data we should investigate. Sometimes, it is considered bad practice, but we could try to make a median cut for the dependent variable and run logistic regression.
dplyr::mutate() in combination with the ifelse() function.
Running one model is not enough. It may be that the assumptions for logistic regression are not fulfilled, or a reviewer simply doesn’t like these types of regressions. Instead, she proposes to run a binomial regression but with a cauchit link.
?family to see how you can include a cauchit link.
Ok, using different link functions is sometimes done as they provide different model fits. That’s definitely something we should investigate as well.
test = "LRT" to perform a likelihood ratio test. What’s your interpretation?